Creating Custom Network Packet Processing Pipelines on HMC-Enabled FPGAs

Authors

Jehandad Khan

Peter Athanas

Abstract

A higher tier of network packet processing performance can be achieved by augmenting the agility of FPGAs with the sheer streaming throughput of a Hybrid Memory Cube (HMC). A programmable data plane, specified in a domain-specific language, enables the creation of custom protocols and packet operations. This paper presents an effort to map one such domain-specific language, P4 (Programming Protocol-independent Packet Processors), to an HMC-enabled FPGA platform, using HLS as the intermediate representation. The use of HLS affords productivity advantages and enables a close correspondence between the P4 code and the hardware units required to implement its functionality. The resulting design leverages the parallel nature of the HMC to yield a packet pipeline capable of delivering 30 million packets per second (Mpps) using only a single HMC channel. For a Layer-3 router with an average packet length of 128 bytes, this translates to 30 Gbps of data throughput. As demonstrated here, the system is capable of achieving 300 Mpps (or 300 Gbps) of throughput using 10 HMC user channels.

I. HMC AS A PACKET LOOKUP MEMORY

The Micron Hybrid Memory Cube consists of multiple memory dies stacked together and interconnected using through-silicon via (TSV) technology. The bottom layer of the stack contains a controller that manages transactions to the memory layers above. The hallmark of the HMC is its high multiport throughput for random memory access. Traditional memory interfaces rely on on-chip caching hierarchies that exploit locality of access for performance. The HMC, by contrast, can deliver an order of magnitude higher random-access performance without local caching. High random-access throughput and support for atomic operations make the HMC well suited for performing packet lookups in a networking context.
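The throughput figures in the abstract follow from simple arithmetic. A 128-byte packet is 1,024 bits, so each million packets per second corresponds to roughly 1.024 Gbps. The short Python sketch below makes the calculation explicit. It ignores Ethernet framing overhead (preamble and inter-frame gap). It also assumes throughput scales linearly with the number of channels, as the abstract's 10-channel figure implies. The function and constant names are illustrative and do not come from the paper.

```python
def packet_rate_to_gbps(packets_per_sec: float, avg_packet_bytes: float) -> float:
    """Convert a packet rate and an average packet size into a bit rate in Gbps.

    Framing overhead (Ethernet preamble, inter-frame gap) is ignored,
    matching the simple estimate quoted in the abstract.
    """
    return packets_per_sec * avg_packet_bytes * 8 / 1e9


# Figures taken from the abstract.
MPPS_PER_CHANNEL = 30e6   # packets per second sustained by one HMC channel
AVG_PACKET_BYTES = 128    # average packet length for the Layer-3 router
NUM_CHANNELS = 10         # HMC user channels in the scaled-up system

# Assumption: aggregate packet rate scales linearly with channel count.
single = packet_rate_to_gbps(MPPS_PER_CHANNEL, AVG_PACKET_BYTES)
total = packet_rate_to_gbps(MPPS_PER_CHANNEL * NUM_CHANNELS, AVG_PACKET_BYTES)

print(f"1 channel:   {MPPS_PER_CHANNEL / 1e6:.0f} Mpps -> {single:.2f} Gbps")
print(f"{NUM_CHANNELS} channels: {MPPS_PER_CHANNEL * NUM_CHANNELS / 1e6:.0f} Mpps -> {total:.2f} Gbps")
```

With these inputs the sketch reports 30.72 Gbps for one channel and 307.20 Gbps for ten. The abstract rounds these to 30 Gbps and 300 Gbps.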
Network packet processing requires little state to be maintained between packets, which makes it an embarrassingly parallel operation. Tables occupy only hundreds of megabytes, even with millions of entries, but the random-access rate makes memory performance critical. Because there is little to no correlation between individual packets, memory accesses for flow lookup follow no pattern. This characteristic, coupled with the dependencies between flow rules, makes it difficult to exploit on-chip cache lines. Making off-chip memory usable otherwise requires complex logic. The traditional solution to this problem is the use of dedicated TCAM devices, which are expensive and power hungry, and therefore less than ideal. The HMC is a good alternative for increasing table scale, since it offers off-chip memory access without compromising throughput. Its parallel nature fits this scenario well, and its ability to perform atomic updates opens interesting possibilities for maintaining packet counts, flow hit counts, and other statistics. The multiple parallel interfaces exposed by the HMC controller may be mapped either to different mapping stages or to different physical interfaces coming into the FPGA.

Prevalent research on packet processing with FPGAs has been restricted to on-chip memory because of the trade-off between table scale and performance. On-chip memory in such designs is already stressed by the stringent buffer requirements of store-and-forward architectures, and these buffers also affect the maximum clock frequency of the design. Moreover, relying on on-chip memory puts a hard limit on the maximum possible table size. The latency of the HMC may become an issue in some applications. It may, however, be offset either by speculative issue in the packet pipeline or by intelligent caching mechanisms.
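The lookup scheme described above can be modelled in software. Its main features are uncorrelated flow lookups spread over parallel memory interfaces and atomic per-flow hit counters. The following is a minimal Python sketch under stated assumptions, not the authors' HLS implementation. The following elements are all hypothetical choices made for illustration:

- the class `ChannelizedFlowTable`;
- its direct-mapped, single-entry buckets;
- the use of a BLAKE2 hash to select a channel and bucket.

A per-channel `threading.Lock` stands in for the HMC's in-memory atomic increment.

```python
from __future__ import annotations

import hashlib
import threading
from dataclasses import dataclass
from typing import Optional


@dataclass
class FlowEntry:
    key: bytes       # exact-match key, e.g. a packed destination address
    action: str      # action applied on a hit (hypothetical string encoding)
    hits: int = 0    # per-flow hit counter


class ChannelizedFlowTable:
    """Hypothetical exact-match flow table spread over independent memory
    channels, standing in for parallel HMC links."""

    def __init__(self, num_channels: int, buckets_per_channel: int) -> None:
        self.num_channels = num_channels
        self.buckets_per_channel = buckets_per_channel
        # One flat, direct-mapped "memory" per channel; one entry per bucket.
        self.memory: list[list[Optional[FlowEntry]]] = [
            [None] * buckets_per_channel for _ in range(num_channels)
        ]
        # One lock per channel models the atomic read-modify-write that the
        # HMC performs inside the memory stack.
        self.locks = [threading.Lock() for _ in range(num_channels)]

    def locate(self, key: bytes) -> tuple[int, int]:
        """Hash a key to (channel, bucket). Uncorrelated flows land on
        effectively random addresses, so there is no locality to cache."""
        digest = hashlib.blake2b(key, digest_size=8).digest()
        h = int.from_bytes(digest, "little")
        channel = h % self.num_channels
        bucket = (h // self.num_channels) % self.buckets_per_channel
        return channel, bucket

    def insert(self, key: bytes, action: str) -> bool:
        """Install or update a rule; returns False on a bucket collision."""
        channel, bucket = self.locate(key)
        slot = self.memory[channel][bucket]
        if slot is None:
            self.memory[channel][bucket] = FlowEntry(key, action)
        elif slot.key == key:
            slot.action = action  # update in place, keeping the hit count
        else:
            return False  # collision: a real design would probe or add ways
        return True

    def lookup(self, key: bytes, default: str = "drop") -> str:
        """Return the action for key (or default on a miss), counting hits."""
        channel, bucket = self.locate(key)
        entry = self.memory[channel][bucket]
        if entry is None or entry.key != key:
            return default
        with self.locks[channel]:  # stands in for an HMC atomic add
            entry.hits += 1
        return entry.action


table = ChannelizedFlowTable(num_channels=10, buckets_per_channel=4096)
table.insert(b"\x0a\x00\x00\x01", "forward:port1")  # key for 10.0.0.1
table.lookup(b"\x0a\x00\x00\x01")  # returns "forward:port1"
```

A direct-mapped table keeps the sketch short. A hardware pipeline would more likely use several ways per bucket or probing to handle collisions. It would also issue lookups to different channels concurrently rather than one at a time.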

Similar articles

Overlay Architectures for FPGA-Based Software Packet Processing

Martin Labrecque, Doctor of Philosophy, Graduate Department of Electrical and Computer Engineering, University of Toronto, 2011. Packet processing is the enabling technology of networked information systems such as the Internet and is usually performed with fixed-function custom-made ASIC chips. As communication protocols evolve rapidly...

Platform and Methodology for Teaching Design of Hardware Modules in Internet Routers and Firewalls

An instructional platform has been developed that allows rapid prototyping of network packet processing functions in hardware. This platform, called the Field Programmable Port Extender (FPX), enables engineering students to rapidly prototype and implement components for use in an Internet router or firewall. Customized circuits allow networking equipment to increase the throughput and enhance fu...

Framework for Application Mapping over Packet-Switched Network of FPGAs: Case Studies

The algorithm-to-hardware High-level synthesis (HLS) tools today are purported to produce hardware comparable in quality to handcrafted designs, particularly with user directive driven or domain-specific HLS. However, HLS tools are not readily equipped for when an application/algorithm needs to scale. We present a (work-in-progress) semi-automated framework to map applications over a packet-sw...

Handel-C Design of a Packet Processing Device for platform-FPGAs

Handel-C is a convenient tool for high-level design for FPGAs. Platform FPGAs, with high levels of integration, are suitable as packet processing devices. An existing ATM packet-switching design has been augmented with a CAM, with a simplified control structure, by the addition of a back-pressure mechanism, and a new shared-memory buffering scheme. The paper gives details of the Handel-C impleme...

Design and Implementation of a String Matching System for Network Intrusion Detection using FPGA-based low power multiple-hashing Bloom Filters

Modern Network Intrusion Detection Systems (NIDS) inspect the network packet payload to check if it conforms to the security policies of the given network. This process, often referred to as deep packet inspection, involves detection of predefined signature strings or keywords starting at an arbitrary location in the payload. String matching is a computationally intensive task and can become a ...

Publication date: 2017
